Introduction to R

ResBaz Victoria 2024

Introducing R

R is a language and an environment for working with data.


We will primarily work with data by writing R code.


R has a large community of users and developers,
and many specialized packages.

Why code?

If every step of your analysis is recorded in an R script, with no manual steps:

  • you have a complete record of what you have done
  • changes are easily tested, and poor early decisions are easily fixed
  • today’s big project can become a function in a package and serve as tomorrow’s building block

Writing code is an essential part of reproducible research.

  • other researchers can precisely understand and verify your work


R is open-source and free, so others can use your code without any barriers.

Data analysis follows a script

Diagram from “R for Data Science” book (https://r4ds.hadley.nz/)

A self-portrait by ChatGPT

“Model” here is intended to cover a broad range of tasks:

  • Summarize data with counts, means, etc.
  • More generally “fit a model” to the data.
    • Traditional statistical models.
    • Machine learning models.
  • Using the model, perform statistical tests.
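
As a rough illustration of these three kinds of task, here is a minimal sketch in base R. The choice of the built-in mtcars data set and the variable names are assumptions made for this example only, not part of the workshop material.

```r
# Built-in example data (chosen for illustration): fuel economy for 32 cars.
data(mtcars)

# Summarize: counts and means by group.
n_by_cyl <- table(mtcars$cyl)                            # cars per cylinder count
mean_mpg_by_cyl <- tapply(mtcars$mpg, mtcars$cyl, mean)  # mean mpg per cylinder count

# Fit a model: a traditional linear regression of mpg on weight.
fit <- lm(mpg ~ wt, data = mtcars)

# Use the model to perform a statistical test:
# the p-value for the hypothesis that the slope for wt is zero.
slope_p_value <- summary(fit)$coefficients["wt", "Pr(>|t|)"]

print(n_by_cyl)
print(mean_mpg_by_cyl)
print(coef(fit))
print(slope_p_value)
```

The same pattern (summarize, fit, test) applies to other model types; only the fitting function changes.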

Modelling is enabled and informed by the other steps!

  • Visualization is critical for identifying problems and for making sure you are asking the right question.

  • You need to load, tidy, and perhaps transform your data before you can plot and model it.

  • Finally, you should communicate your results.
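
The steps above can be sketched end to end in a few lines. This is only one possible version, written in base R so that it runs without extra packages; the built-in airquality data set, the temporary CSV file, and all variable names are assumptions made for illustration.

```r
# Load: write a built-in data set to a temporary CSV and read it back,
# standing in for a real data file.
csv_path <- tempfile(fileext = ".csv")
write.csv(airquality, csv_path, row.names = FALSE)
air <- read.csv(csv_path)

# Tidy: keep only rows with no missing values.
air <- na.omit(air)

# Transform: add temperature in Celsius (Temp is recorded in Fahrenheit).
air$TempC <- (air$Temp - 32) * 5 / 9

# Visualize: does ozone rise with temperature?
plot(air$TempC, air$Ozone,
     xlab = "Temperature (degrees C)", ylab = "Ozone (ppb)")

# Model: fit a straight line and draw it on the plot.
fit <- lm(Ozone ~ TempC, data = air)
abline(fit)
print(coef(fit))
```

Plotting before modelling is deliberate here: the scatter plot shows whether a straight line is a sensible description of the data at all.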

(workshop)

Conclusion

We’ve had a taste of the workflow in R. We’ve covered loading, touched on tidying, done some visualization and a little modelling (or at least summarization).


You still need to communicate your results to your colleagues or the wider world! Quarto can help with this.


Learning to program in R will supercharge your abilities: writing your own functions, loops, packages, …
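
As a small taste of what that looks like, here is a sketch of a user-written function used inside a loop. The function name standard_error is hypothetical, and the built-in iris data set is used only for illustration.

```r
# A hypothetical helper: standard error of the mean, ignoring missing values.
standard_error <- function(x) {
  x <- x[!is.na(x)]
  sd(x) / sqrt(length(x))
}

# Loop over the numeric columns of the built-in iris data,
# storing one result per column in a named vector.
numeric_columns <- names(iris)[sapply(iris, is.numeric)]
results <- numeric(0)
for (column in numeric_columns) {
  results[column] <- standard_error(iris[[column]])
}
print(results)
```

Once a function like this exists, it can be reused in later scripts or collected into a package.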